This R markdown file is written to simulate the single-cell data to demonstrate if the order of the batch when applying mutual nearest neighbors (MNN) would affect the clustering results.
In this simulation study, we are going to simulate four different batches to mimic the temporal from week 0 to week 24. As time goes on, the observed cell population increases. In Figure 1, we present the idea of the simulated data. In each batch, there are different numbers of cell populations: 3, 4, 5, and 7. There exist batch effects among the cell populations 1, 2, 3, 4 and 5 across four batches.
Figure 1
To simulate the data, we are going to use Symsim package, which is one most recently published R package for simulating multiple faceted variability in single cell RND sequencing data.
## [1] 0.126397
##
## 1 2 3 4 5 6 7
## 200 300 300 300 300 300 300
## [1] 0.4763277
## [1] 0.1619887
## [1] 0.4922443
## [1] 0.1609091
## [1] 0.5029461
## [1] 0.1617658
## [1] 0.4535441
## [1] 0.1488027
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 2000
## Number of edges: 73676
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9477
## Number of communities: 14
## Elapsed time: 0 seconds
######
perm_table = permutations(n = 4, r = 4, v = 1:4)
perm_table
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 1 2 4 3
## [3,] 1 3 2 4
## [4,] 1 3 4 2
## [5,] 1 4 2 3
## [6,] 1 4 3 2
## [7,] 2 1 3 4
## [8,] 2 1 4 3
## [9,] 2 3 1 4
## [10,] 2 3 4 1
## [11,] 2 4 1 3
## [12,] 2 4 3 1
## [13,] 3 1 2 4
## [14,] 3 1 4 2
## [15,] 3 2 1 4
## [16,] 3 2 4 1
## [17,] 3 4 1 2
## [18,] 3 4 2 1
## [19,] 4 1 2 3
## [20,] 4 1 3 2
## [21,] 4 2 1 3
## [22,] 4 2 3 1
## [23,] 4 3 1 2
## [24,] 4 3 2 1
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 2000
## Number of edges: 75321
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8588
## Number of communities: 6
## Elapsed time: 0 seconds
## [1] 0.230185
## [1] 0.052
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 2000
## Number of edges: 70643
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8589
## Number of communities: 8
## Elapsed time: 0 seconds
## [1] 0.175056
## [1] 0.008
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 2000
## Number of edges: 67654
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8541
## Number of communities: 7
## Elapsed time: 0 seconds
## [1] 0.070489
## [1] 0
## [1] 0.175056
## [1] 0.008
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 2000
## Number of edges: 72338
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8511
## Number of communities: 6
## Elapsed time: 0 seconds
## [1] 0.23826
## [1] 0
## [1] 0.52
## attr(,"ci")
## lower upper
## 0.52 0.52
## [1] 0.41
## attr(,"ci")
## lower upper
## 0.41 0.41
## [1] 0.33
## attr(,"ci")
## lower upper
## 0.33 0.33
## [1] 0.47
## attr(,"ci")
## lower upper
## 0.47 0.47
Haghverdi, Laleh, et al. “Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.” Nature biotechnology 36.5 (2018): 421.
Zhang, Xiuwei, Chenling Xu, and Nir Yosef. “Simulating multiple faceted variability in single cell RNA sequencing.” Nature communications 10.1 (2019): 2611.